Trellis quants are based on a novel integer trellis rather than the scalar or block-based schemes used by other quant families. The integer trellis formulation enables reasonable CPU performance even at very low bits per weight — an unusual property at these compression levels.
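As an illustrative sketch only (not ik_llama.cpp's actual kernel), the core idea of an integer trellis decoder is that each weight is generated on the fly from a small integer state using cheap integer operations, so only a few bits of state transition per weight need to be stored. The multiplier, increment, and byte-sum mapping below are hypothetical stand-ins:

```python
# Hypothetical constants for an LCG-style state mixer (illustrative only).
MULT = 89226354
ADD = 64248484
MASK = 0xFFFFFFFF


def next_state(state: int, bits: int, k: int = 2) -> int:
    # Shift in k bits from the compressed stream, then mix with an integer
    # multiply-add so nearby states decode to very different weights.
    state = ((state << k) | bits) & MASK
    return (state * MULT + ADD) & MASK


def decode_weight(state: int) -> float:
    # Map the 32-bit state to a roughly bell-shaped value by summing its
    # four byte fields, then centering and scaling.
    b = [(state >> (8 * i)) & 0xFF for i in range(4)]
    return (sum(b) - 2 * 255) / 128.0


def decode_block(packed_bits, seed=0xDEADBEEF):
    # Decode a run of weights from the per-weight bit groups; the trellis
    # state carries information forward, which is what lets quality hold
    # up at 1-2 bits per weight.
    state = seed
    out = []
    for bits in packed_bits:
        state = next_state(state, bits)
        out.append(decode_weight(state))
    return out


print(decode_block([3, 1, 0, 2]))
```

Because decoding is just shifts, multiplies, and adds on integers, it vectorizes well on CPUs, which is the intuition behind the CPU performance claim above.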
## Available types
| Type | Bits per weight | Notes |
|---|---|---|
| IQ1_KT | ~1 | Extreme compression; quality highly dependent on model and imatrix |
| IQ2_KT | ~2 | Aggressive compression; practical for very large models |
| IQ3_KT | ~3 | Better quality retention than IQ2_KT at moderate size increase |
| IQ4_KT | ~4 | Closest to standard 4-bit quality within the trellis family |
## Platform support
| Backend | Supported |
|---|---|
| CUDA | Yes |
| Metal | Yes |
| ARM NEON | Yes |
| CPU (AVX2) | Yes |
ROCm and Vulkan backends are not actively maintained. See the main README for details.
## When to use trellis quants
Trellis quants are the right choice when memory constraints are severe and other options do not fit:

- Very large models (70B+) where even IQ2_K does not fit in available memory
- Situations where you need the smallest possible file at a given quality floor
- Deployments on hardware where 1–2 BPW is the only viable option
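To make the memory constraint concrete, a rough back-of-the-envelope calculation using the nominal bit widths from the table above (real files are somewhat larger: metadata, scales, and embedding/output tensors are stored at other precisions):

```python
def model_size_gib(n_params_billions: float, bpw: float) -> float:
    """Approximate weight-storage size in GiB: params * bits-per-weight / 8 bytes."""
    return n_params_billions * 1e9 * bpw / 8 / 2**30


# A 70B-parameter model at the trellis family's nominal bit widths:
for name, bpw in [("IQ1_KT", 1.0), ("IQ2_KT", 2.0), ("IQ3_KT", 3.0), ("IQ4_KT", 4.0)]:
    print(f"{name}: ~{model_size_gib(70, bpw):5.1f} GiB")
```

At ~2 BPW a 70B model needs roughly 16 GiB for the weights alone, which is why these types are often the only option on consumer hardware.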
## Tradeoffs vs IQK quants
| | IQK quants | Trellis quants |
|---|---|---|
| Quality at same BPW | Higher | Lower |
| File size at a comparable quality tier | Larger | Smaller |
| CPU performance | Good | Reasonable (novel integer trellis design) |
| Lowest available BPW | ~2 (IQ2_K) | ~1 (IQ1_KT) |
## Quantizing a model
The `--custom-q` and `--dry-run` options available for IQK quants also work with the trellis types. See the IQK quants page for usage details.
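A sketch of the quantization workflow, assuming the standard `llama-quantize` tool and hypothetical file names; an importance matrix is strongly recommended at these bit rates, since the types table above notes that IQ1_KT quality depends heavily on the imatrix:

```shell
# Quantize an f16 GGUF to IQ2_KT using an importance matrix
# (paths and file names here are hypothetical)
./bin/llama-quantize --imatrix imatrix.dat model-f16.gguf model-iq2_kt.gguf IQ2_KT

# Same command with --dry-run to preview the quantization plan
# without writing the output file
./bin/llama-quantize --dry-run --imatrix imatrix.dat model-f16.gguf model-iq2_kt.gguf IQ2_KT
```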